Evaluation of the Global Lung Function Initiative 2012 reference values for spirometry in an Iranian population

Spirometry is an important measurement for detecting and monitoring chronic obstructive pulmonary disease. The validity of the multi-ethnic Global Lung Function Initiative 2012 (GLI-2012) spirometric norms has been debated in some countries. The aim of the present study was to evaluate the applicability of the GLI reference norms in the Iranian population. A cross-sectional study was performed on 622 healthy non-smokers (204 males and 418 females, age range: 4–82 years) between July 16 and August 27, 2019 in Iran. Z-scores for spirometric data [FEV1 (forced expiratory volume in 1 s), FVC (forced vital capacity), FEV1/FVC, and FEF25–75% (forced expiratory flow averaged over the middle portion of FVC)] were calculated. In line with previously agreed criteria, a mean Z-score outside the range of ± 0.5 was considered clinically significant. The mean (SD) Z-score values of FEV1, FVC, FEV1/FVC and FEF25–75% were 0.44 (1.21), 0.49 (1.14), 0.11 (1.03), and − 1.13 (0.99) in males and 0.61 (1.14), 0.89 (1.26), 0.17 (0.88) and − 0.49 (0.96) in females, respectively. The Z-score of FEV1/FVC was below the lower limit of normal (LLN) in 3.43% of men and 2.01% of women aged ≥ 21 years, whereas the proportion was considerably higher in people under 21 years old (46.2% in boys and 40.0% in girls). The GLI reference values are not perfect for the Iranian population, especially in children below 10 years old. The GLI reference values were appropriate for the population above 21 years; however, they would overestimate the prevalence of airway obstruction in individuals below 21 years.

Spirometry is a pivotal screening test for the diagnosis of obstructive lung disease. The main spirometric indices include forced vital capacity (FVC) and forced expiratory volume in 1 s (FEV1) 1,2 . These measurements are generally expressed as a percentage of predicted values, which are derived from a healthy, non-smoking reference population 3,4 . However, predicted normal values vary widely between sources, leading to biased results [5][6][7] . This bias can be avoided by using sex-, age-, height-, and ethnicity-specific Z-scores 8 .
In 2012, the Global Lung Function Initiative (GLI-2012) released spirometric norms derived from data collected from 72031 healthy individuals aged 3-95 years 8 . The GLI-2012 equations provide sex-, age-, height-, and ethnicity-specific reference values as well as the lower limit of normal (LLN) for spirometry.
In pulmonary function testing, the fifth percentile of the distribution of normal values (a Z-score of − 1.64) is defined as the lower limit of normal (LLN). Spirometric indices below the LLN would therefore be observed in only 1 in 20 (5%) healthy individuals 9 .
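The link between the fifth percentile and a Z-score of about − 1.64 can be checked numerically. The following is a minimal sketch that assumes Z-scores in a healthy population follow a standard normal distribution; it uses only the Python standard library, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumption: Z-scores of healthy subjects follow a standard normal distribution.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Z-score that cuts off the lowest 5% of the distribution (the LLN).
z_lln = std_normal.inv_cdf(0.05)

# Fraction of healthy subjects expected to fall below a cut-off of -1.645.
fraction_below_lln = std_normal.cdf(-1.645)

print(f"Z-score at the fifth percentile: {z_lln:.3f}")
print(f"Expected fraction below -1.645: {fraction_below_lln:.3f}")
```

Under this assumption, roughly one healthy subject in 20 is expected to fall below the LLN purely by chance.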
The fit of the GLI-2012 norms has been tested, and some countries have approved them for interpreting spirometry results, for example in the Australasian 10 , Norwegian 1 , German 11 , and French 12 populations. On the other hand, the GLI-2012 norms seem unsuitable for clinical use in the Swedish 13 , Finnish 14 , Brazilian 15 , Malaysian 16 , and Chinese populations 17 . Some countries, including Iran, have not standardized the GLI-2012 equations 8 , although external validation of the GLI-2012 norms is recommended 7,8 . Moreover, the applicability of the norms should be evaluated in other parts of the world to verify their suitability in these regions. To date, no study has evaluated the applicability of the GLI-2012 norms in the Iranian population.
The present study aimed to evaluate whether the GLI reference values apply to the Iranian population.
Source population and sampling. The source population of this study was recruited from Tehran residents presenting to health houses affiliated with Tehran Municipality. Randomized cluster sampling was used, and at least two health houses in each municipality district were selected (44 centers). The individuals who presented to health centers received an explanation about the study; then, they were asked to refer family members (3-95 years) if they were willing to join. In the second step, a short meeting was held with the household members or the household head to explain the purpose of the study. Then, all individuals who wished to participate in the study were interviewed and screened. Written informed consent was obtained from all participants. Consenting healthy non-smokers were included in the study, provided they had no history of any current airway or lung disease (breathlessness, cough, wheeze, ischemic heart disease, and rheumatic disorders). Current or past smokers were defined as those who had smoked at least 100 cigarettes in their lifetime and/or had a lifetime exposure of more than one pack-year of smoking. Subjects who were not eligible for a baseline spirometry test and those who reported respiratory symptoms such as cough, sputum, or rhinorrhea within seven days prior to the examination were excluded from the study.
Sex (female, male), age (one decimal point), height (with a precision of 0.5 cm without shoes using an accurate stadiometer), and weight (with a precision of 0.5 kg measured without a jacket, bag, veil (in women), or wristwatch, and with empty pockets) were recorded.
Spirometric measurements. The advanced Spirobank II device (MIR, Rome, Italy) was used in this study, and FEV1, FVC, FEV1/FVC, and FEF25-75% (forced expiratory flow averaged over the middle portion of FVC) were measured.
The spirometers were calibrated every morning, and a minimum of three and a maximum of eight measurements were performed per subject. The measurements were made without the use of bronchodilators according to the American Thoracic Society/European Respiratory Society (ATS/ERS) recommendations 3 . The repeatability criterion was < 5% deviation from the second-highest value. Among the three largest values that were within 150 mL of each other, the largest was taken as the best measurement.
Quality control. The spirometry software provided feedback on the acceptability of the technique and on repeatability. Following the approach used in HUNT3/YoungHUNT3, curves were graded A-F, partly in line with a recent study by Hankinson et al. 18 . All curves graded A-C were included in the study, i.e., those with at least two acceptable blows differing by less than 150 mL. The inter- and intra-observer agreements were excellent.
Sample size. According to the ERS/Global Lung Function Initiative (GLFI), representative samples of at least 300 subjects can be used for validation in groups not covered by the GLI equations 8 . Six hundred females and 300 males in different age groups (4-82 years old) were selected. The sample size for each center was calculated based on the proportion of the population per regional municipality. Finally, after removing samples with grades D and F, 622 subjects (204 males and 418 females) were fully eligible to enter the study.
Analysis. Using the Excel macro for GLI 8 , reference values, lower limits of normal (LLN), Z-scores, and percentiles for FEV1, FVC, FEV1/FVC, and FEF25-75% were calculated for each subject in the population. If the agreement between the observed values in the reference population and the GLI reference values is perfect, the mean Z-scores should ideally be zero, and the standard deviation (SD) should be one. According to the agreement reached by the GLI team and other studies validating these spirometric reference equations (SRE), a mean Z-score outside the range of ± 0.5 is considered clinically significant, corresponding to at least 5-6% difference in the specified lung function measurement 8,13,16,19,20 .
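To illustrate how a Z-score and an LLN can follow from a reference equation, the sketch below assumes the LMS form on which the GLI-2012 equations are built, in which each index has a predicted median M, a coefficient of variation S, and a skewness parameter L for a given sex, age, height, and ethnic group. The L, M, and S values are placeholders invented for this example, not GLI coefficients, and the function names are hypothetical; the study itself relied on the GLI Excel macro.

```python
import math

def lms_z_score(measured, L, M, S):
    """Z-score of a measured value under the LMS method."""
    if L == 0:
        return math.log(measured / M) / S
    return ((measured / M) ** L - 1.0) / (L * S)

def lms_lln(L, M, S, z=-1.645):
    """Value at the given Z-score (fifth percentile by default)."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

# Placeholder parameters (hypothetical, NOT taken from the GLI-2012 tables):
# predicted FEV1 of 4.00 L, coefficient of variation 0.12, skewness 1.0.
L_par, M_par, S_par = 1.0, 4.00, 0.12

z = lms_z_score(3.50, L_par, M_par, S_par)   # subject measured at 3.50 L
lln = lms_lln(L_par, M_par, S_par)

print(f"Z-score: {z:.2f}, LLN: {lln:.2f} L")
```

A measured value equal to the LLN maps back to a Z-score of − 1.645, which is a quick consistency check on any implementation of these two formulas.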
The mean values and standard deviations were calculated, and Z-score curve plots were drawn. Possible relationships between Z-scores and age, height, weight, and sex were examined using multiple linear regression models. If the GLI reference values are applicable, no such relationships should exist.
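The regression check described above can be sketched as an ordinary least-squares fit of Z-scores on subject characteristics. The example below is an illustration only: the data are synthetic, built with known slopes so the fit can be verified, the 0/1 coding of sex is an assumption, and the helper `ols_fit` is a hypothetical name. It does not reproduce the statistical software or the P-values used in the study.

```python
def ols_fit(predictors, response):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(r) for r in predictors]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, response))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for k in range(p):
            if k != col:
                factor = a[k][col]
                a[k] = [vk - factor * vc for vk, vc in zip(a[k], a[col])]
    return [a[i][p] for i in range(p)]

# Synthetic subjects: (age in years, height in cm, sex coded 0/1).
subjects = [(5, 110, 0), (12, 150, 1), (20, 172, 0), (35, 168, 1),
            (50, 180, 1), (65, 160, 0), (80, 158, 0), (30, 175, 1)]

# Synthetic Z-scores generated with known coefficients (no noise).
true_coef = [0.2, 0.01, -0.005, 0.3]
z_scores = [true_coef[0] + true_coef[1] * age + true_coef[2] * height
            + true_coef[3] * sex for age, height, sex in subjects]

coef = ols_fit(subjects, z_scores)
for name, value in zip(["intercept", "age", "height", "sex"], coef):
    print(f"{name}: {value:.4f}")
```

If the reference values fit a population well, the fitted slopes for age, height, and sex would be expected to lie close to zero.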
LLN was defined as the lower fifth percentile in the distribution from which the GLI reference values are derived, as calculated by the GLI Excel macro, if not explicitly stated otherwise. The 90% limits of normality, which are expected to include 90% of the observations if the agreement is perfect, were defined as observations with GLI Z-scores within the − 1.645 to + 1.645 range 8 .  Table 1.
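As a small illustration of the 90% limits of normality, the sketch below counts how many Z-scores fall below − 1.645, within ± 1.645, and above + 1.645. The Z-scores are invented for the example, and the function name `summarize_z_scores` is hypothetical.

```python
def summarize_z_scores(z_scores, limit=1.645):
    """Percentage of Z-scores below, within, and above the +/- limit."""
    n = len(z_scores)
    below = sum(z < -limit for z in z_scores)
    above = sum(z > limit for z in z_scores)
    within = n - below - above
    return {
        "below_lln_pct": 100.0 * below / n,
        "within_limits_pct": 100.0 * within / n,
        "above_uln_pct": 100.0 * above / n,
    }

# Invented Z-scores for ten hypothetical subjects.
example_z = [-2.1, -1.7, -0.9, -0.3, 0.0, 0.2, 0.6, 1.1, 1.5, 1.9]
summary = summarize_z_scores(example_z)
print(summary)
```

With perfect agreement, about 90% of observations would be expected within the limits and about 5% in each tail.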

Ethical approval and consent to participate.
The Caucasian GLI-2012 equations were applied to our population. Overall, the mean Z-scores of FEV1, FVC, and the FEV1/FVC ratio for males and females in the various age groups were higher than the Caucasian predicted values (range: 0.01 to 1.05) except for the FEV1/FVC in the age group under 21 years (range: − 1.11 to − 0.09).
The normal distribution curves of FEV1, FVC, and FEV1/FVC based on observed GLI Z-score mean and standard deviation values in men and women are presented in Fig. 1a-f.
The distribution of the Z-scores of FEV1, FVC, FEV1/FVC, and FEF25-75% stratified by sex and age in the reference sample of healthy individuals is shown in Table 2.
The FEV1 Z-score was smaller than 0.5 in men and women aged 10-21, 22-29, and 30-39 years; however, its standard deviation was often above 1. Moreover, this value was below 0.5 in girls under 10 years old and in men aged 40-49 years and over 70 years. The FEV1 Z-score was not different from zero (by one-sample t-test analysis) in the age groups 10-29 and over 70 years in both genders (P > 0.05) (Table 2).
The Z-score of FEV1/FVC was below 0.5 in all age groups except for the age group under 10 (both genders) and 60-69 (males) years (Table 2).
In the age group over 21 years, the Z-score of FEV1/FVC was below the LLN in seven men (3.43%) and eight women (2.01%). However, the proportion was considerably higher among those under 21 years old: six boys (46.2%) and eight girls (40.0%).
No FEV1/FVC Z-score fell below the LLN in women over the age of 60 or in men aged 22-39 and 60-69 years. The proportion was also below 5% in women aged 50-59 and men aged 30-59 years (1-3.4%).
The proportion of FEV1/FVC Z-scores above the upper limit of normal (ULN, > 1.64) ranged between 0% (age group 22-29 years) and 25% (over 70 years) in men and between 0% (age group 10-21 and over 70 years) and 28.3% (age group 50-59 years) in women.
The FEV1 Z-score was within the 90% limits of normality (− 1.64 to + 1.64) in 81.3% of the observations (83.7% in males and 80.4% in females). The corresponding figure was 78.9% for FVC (84.5% in males and 76.1% in females) and 93.3% (90.2% in males and 94.8% in females) for the FEV1/FVC ratio in the age group over 21 years.
The Z-scores of FEV1, FVC, FEV1/FVC, and FEF25-75% were analyzed according to age, height, weight, and gender using a linear regression model.
Age, weight, and height (but not gender) had an impact on the FEV1/FVC Z-score in univariate regression (P < 0.05). In multiple linear regression (including height, weight, and age, i.e., the variables with P < 0.2 in univariate linear regression), height and age remained associated with the FEV1/FVC Z-score (B-coefficient = 0.012 and 0.007; P = 0.008 and P = 0.001, respectively) (Fig. 2a). There was a significant association between age and FEV1 Z-score (B-coefficient = 0.007; P = 0.011). In the multiple linear regression (in the presence of age, gender, and height), age remained statistically significant (B-coefficient = 0.008; P = 0.006) (Fig. 2b).
There was a significant association between gender and FVC Z-score (B-coefficient = 0.389; P < 0.001) in univariate linear regression; this association was maintained in the multiple regression model (B-coefficient = 0.346; P = 0.004) (Fig. 2c).
The prevalence of COPD defined by spirometry based on the fixed ratio (FR) criterion increased with age from 0.8% in the age group 30-39 years to 16.7% in the age group > 70 years (P for linear by linear association < 0.001). The prevalence of COPD according to the LLN criterion did not follow a specific trend (P for linear by linear association = 0.749). There was 98.13% agreement between the FR and LLN methods (Fleiss' kappa coefficient = 0.58, P < 0.001). The prevalence of COPD based on FR and LLN according to age and sex is presented in Table 3.
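The comparison of the two criteria can be sketched as follows. Each subject is classified once by the fixed ratio (FEV1/FVC < 0.70) and once by the LLN (FEV1/FVC below the subject-specific LLN), and agreement is summarized by the observed agreement and by Fleiss' kappa in its two-rater form, where chance agreement is computed from the pooled category proportions. The subject values are invented, and the function name is hypothetical; this is not the software used in the study.

```python
def agreement_and_fleiss_kappa(calls_a, calls_b):
    """Observed agreement and two-rater Fleiss' kappa for binary calls."""
    n = len(calls_a)
    p_obs = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Pooled proportion of positive calls across both methods.
    p_pos = (sum(calls_a) + sum(calls_b)) / (2 * n)
    p_exp = p_pos ** 2 + (1 - p_pos) ** 2
    if p_exp == 1.0:
        return p_obs, float("nan")  # kappa undefined when only one category occurs
    return p_obs, (p_obs - p_exp) / (1 - p_exp)

# Invented subjects: (measured FEV1/FVC, subject-specific LLN of FEV1/FVC).
subjects = [(0.82, 0.74), (0.72, 0.75), (0.68, 0.66), (0.62, 0.66),
            (0.80, 0.72), (0.78, 0.70), (0.75, 0.69), (0.85, 0.73),
            (0.71, 0.65), (0.64, 0.67)]

fr_calls = [ratio < 0.70 for ratio, _ in subjects]     # fixed-ratio criterion
lln_calls = [ratio < lln for ratio, lln in subjects]   # LLN criterion

agreement, kappa = agreement_and_fleiss_kappa(fr_calls, lln_calls)
print(f"Agreement: {agreement:.1%}, kappa: {kappa:.2f}")
```

The example also shows why a high raw agreement can coexist with a moderate kappa: when most subjects are negative by both criteria, much of the agreement is expected by chance.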

Discussion
This was the first study, performed on 622 healthy non-smoking Iranian children and adults, to evaluate the use of the GLI-2012 reference values to interpret FEV1, FVC, FEV1/FVC and FEF25-75%.
When applying the GLI reference values to the reference population, the Z-scores were always closer to zero in men than in women. The Z-scores of FEV1/FVC and FEF25-75% (in both sexes) were reasonably, although not perfectly, normally distributed but not centered on 0 (mean (SD): 0.11 (1.03) and 0.17 (0.88) for FEV1/FVC, and − 0.11 (0.99) and − 0.49 (0.96) for FEF25-75% in males and females, respectively). Although the Z-scores of FEV1 and FVC were below 0.5 in men, they were between 0.5 and 1 in women. In addition, the SD of all measurements was below 1.5. The Z-scores of spirometry indices of some countries 1,6,10-13,16,17 are shown in Table 4.
A number of studies have found that the use of the GLI equation was ideal in their population, including Spain where the Z-score (mean) of each parameter was close to 0 with a maximum variance of ± 0.5 21 . Other populations in which the use of the GLI-2012 equation has been approved include Norway, Australia, Germany, and America (Asian-Americans) 1,10,11,17 . Although a review of the Z-score of spirometry values in Sweden showed that these values were relatively appropriate, the authors emphasized the lack of equation matching 13 . Moreover, although it was reported that the GLI equation could be used in the French population aged 40-65 years, the Z-scores appeared to be relatively large in women (FEV1 = 0.51, FVC = 1.3). Therefore, it seems that in approving or rejecting the fit of the equation with the source population, the authors' opinion also matters, such that some studies were strict while some were not (Table 4). In general, the GLI equation did not fit the source population in studies conducted in Tunisia, China, Malaysia, India, and Sweden. Comparison of the obtained values (FEV1, FVC, FEV1/FVC, and FEF25-75% Z-scores) with the predicted values according to the age group confirmed the fit of lung function parameters (except FEV1/FVC for men) for subjects aged 10-21, 22-29 and 70-84 years.
The FEV1/FVC Z-score was always under 0.5 in all age groups except males and females under 10 years and men aged 60-84 years.
A few studies have tried to standardize the spirometric measurements in certain age groups, including children and adolescents. Some of these studies confirmed the fit of the GLI-2012 reference values in their community, whereas others emphasized that the reference equations did not match the spirometric data in their children. In a study (2013) conducted in white, black, and South-Asian schoolchildren aged 5-11 years in London, the GLI-2012 reference equations properly fitted spirometric data in white and black children. The values for South Asian children fitted the Southeast Asian equation 22 . Among 712 healthy urban-dwelling 7-13 year-old Zimbabwean schoolchildren, the mean GLI-2012 Z-scores of FVC, FEV1, FEV1/FVC, and maximal-mid expiratory flow (MMEF) were measured using different ethnic GLI-2012 modules. The mean African-American GLI-2012 Z-scores were within 0.5 Z-scores from zero for all the spirometry variables; however, the Z-score SD for the FEV1/FVC ratio was ≥ 1, indicating more variability than the reference population, thus affecting the performance of the African-American GLI-2012 LLN in this population 23 .
Chang et al. studied the age group 5-18 years in Taiwan in 2019 and provided evidence that the GLI-2012 reference equations did not properly match the spirometric data in the current Taiwanese pediatric population, indicating an urgent need for an update of the GLI reference values by the inclusion of more data of non-Caucasian descent 24 . Moreover, in the study by Jones et al. in 2020, the equations currently in use in Brazil seemed to underestimate the lung function of Brazilian children aged 3-12 years 25 . The GLI-2012 reference values for spirometry were appropriate for healthy, well-nourished African school children in Angola, DR Congo, and Madagascar, but the lower limit of normal needed adjustment 20 .
Table 3. Prevalence of COPD based on FR (FEV1/FVC < 0.70) and LLN (FEV1/FVC < LLN) criteria by age and sex (n (%)). FR fixed ratio (FEV1/FVC < 0.7), LLN lower limit of normal (FEV1/FVC < LLN).
In this study, lung function tests in 33 children (13 boys and 20 girls) aged 4-10 years were performed with acceptable quality of grades A to C. The FVC, FEV1, FEV1/FVC, and FEF25-75% values were measured, and almost all mean Z-scores (except for FEV1 in girls and FEF25-75% in boys) fell outside the expected range (> + 0.5 or < − 0.5); however, the standard deviation was estimated to be less than one in all cases except for FEV1/FVC.
The percentage of the population with the FEV1/FVC value below LLN in the age group over 21 years (in both genders) was appropriate (3.43% in men and 2.01% in women) and was less than 5%. In the age group 10-21 years, the percentage of population with the FEV1/FVC value below LLN was relatively appropriate in men (5.03%) but was higher than expected in women (13.3%). These values were extremely high (42.4% in boys and 40% in girls) in the under 10 age group.
The Z-scores derived from the GLI-2012 SRE showed significant associations with gender, age, and anthropometric variables. In multiple linear regression, height and age had positive associations with FEV1/FVC (B-coefficient = 0.01 and B-coefficient = 0.007), age had a positive association with FEV1 (B-coefficient = 0.008), and gender had a relatively strong relationship with FVC (B-coefficient = 0.346) and FEF25-75% (B-coefficient = 0.358).
Limitation of the study. The sample size was below 15 in boys under 10 years old (13 cases) and in women over 70 years old (8 cases).

Conclusion
The GLI reference values are not perfect for the Iranian population, especially in children under 10 years old and females. However, the Z-score of FEV1/FVC matched the predicted value in almost all age groups (except children below 10 years of age). GLI reference values were appropriate for subjects over 21 years, but their use would overestimate the prevalence of airway obstruction in the Iranian population under 21 years. The results of linear regression models showed that age, height, and gender were crucial for establishing prediction equations for the four spirometric measurements, i.e., FEV1/FVC, FVC, FEV1, and FEF25-75%.